Classifying negative and positive points by optimal box clustering
Author
Abstract
In this paper we address the problem of classifying positive and negative data with the technique known as box clustering. A box is homogeneous if it contains only positive (or only negative) points. Box clustering means finding a family of homogeneous boxes that jointly contain all, and only, the positive (negative) points. We first consider the problem of finding a family with the minimum number of boxes. We then refine this problem into finding a family that not only consists of the minimum number of boxes but also covers the points as many times as possible. We call this the maximum redundancy problem. We model both problems as set covering problems solved by column generation. The pricing problem is a maximum box problem. Although this problem is NP-hard, a combinatorial algorithm that performs well is available in the literature. Since pricing must also be carried out during the branch-and-bound search for the set covering problem, we also consider how the pricing must be modified to account for the branching constraints. The computational results show that the set covering approach performs well.
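The two problems named in the abstract can be illustrated on a toy instance. The sketch below is not the paper's column-generation method: it is a brute-force enumeration that only works for a handful of points. The function names (`bounding_box`, `contains`, `homogeneous_boxes`, `min_box_cover`) are hypothetical, and two things are assumptions: candidate boxes are restricted to bounding boxes of subsets of positive points, and redundancy is measured as the total number of (point, box) incidences.

```python
from itertools import combinations


def bounding_box(points):
    """Smallest axis-parallel box (lo, hi) containing the given points."""
    dims = range(len(points[0]))
    lo = tuple(min(p[d] for p in points) for d in dims)
    hi = tuple(max(p[d] for p in points) for d in dims)
    return lo, hi


def contains(box, p):
    """True if point p lies inside the closed box."""
    lo, hi = box
    return all(l <= x <= h for l, x, h in zip(lo, p, hi))


def homogeneous_boxes(positives, negatives):
    """Bounding boxes of positive subsets that contain no negative point."""
    boxes = set()
    for k in range(1, len(positives) + 1):
        for subset in combinations(positives, k):
            box = bounding_box(subset)
            if not any(contains(box, n) for n in negatives):
                boxes.add(box)
    return sorted(boxes)


def min_box_cover(positives, negatives):
    """Smallest family of homogeneous boxes covering all positives.

    Among families of minimum size, return one with the largest total
    coverage count (assumed redundancy measure). Exponential time.
    """
    boxes = homogeneous_boxes(positives, negatives)
    for size in range(1, len(boxes) + 1):
        best, best_redundancy = None, -1
        for family in combinations(boxes, size):
            if all(any(contains(b, p) for b in family) for p in positives):
                redundancy = sum(contains(b, p) for b in family for p in positives)
                if redundancy > best_redundancy:
                    best, best_redundancy = family, redundancy
        if best is not None:
            return list(best), best_redundancy
    return [], 0


if __name__ == "__main__":
    positives = [(0, 0), (1, 1), (3, 0), (4, 1)]
    negatives = [(2, 0.5)]
    family, redundancy = min_box_cover(positives, negatives)
    print(len(family), redundancy)
    for box in family:
        print(box)
```

In this instance the single negative point sits between two groups of positive points, so no single homogeneous box covers all positives and two boxes are needed. A set covering formulation with column generation, as described in the abstract, avoids enumerating every candidate box up front.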
Similar resources
Optimal Feature Selection for Data Classification and Clustering: Techniques and Guidelines
In this paper, principles and existing feature selection methods for classifying and clustering data are introduced. To that end, categorizing frameworks for finding selected subsets, namely, search-based and non-search-based procedures, as well as evaluation criteria and data mining tasks are discussed. In the following, a platform is developed as an intermediate step toward developing an intell...
Fuzzy based efficient drone base stations (DBSs) placement in the 5G cellular network
Currently, cellular networks are one of the essential communication methods for people. Providing proper coverage for the users and also offering high-quality services to them are two of the most important issues of concern in cellular networks. The fifth-generation cellular communication networks can provide higher data transmission rates, which lead to a higher quality of service but this hig...
Genetic Algorithm and Confusion Matrix for Document Clustering
Text mining is one of the most important tools in Information Retrieval. Text clustering is the process of classifying documents into predefined categories according to their content. Existing supervised learning algorithms for automatically classifying text require sufficient documentation to learn exactly. In this paper, a Niching memetic algorithm and a Genetic algorithm (GA) are presented in which ...
New spatial clustering-based models for optimal urban facility location considering geographical obstacles
The problems of facility location and the allocation of demand points to facilities are crucial research issues in spatial data analysis and urban planning. It is very important for organizations and governments to locate their resources and facilities well and to manage resources efficiently, so that all demand points are covered and all needs are met. Most of the recent studies, which f...
Journal title:
- Discrete Applied Mathematics
Volume 165, Issue
Pages -
Publication date: 2014